Advanced Coarsening Schemes for Graph Partitioning* 



Ilya Safro¹, Peter Sanders², and Christian Schulz²

1 Mathematics and Computer Science Division, Argonne National Laboratory 

safro@mcs.anl.gov

2 Karlsruhe Institute of Technology, Institute for Theoretical Informatics, Algorithmics II 
sanders@kit.edu, christian.schulz@kit.edu



Abstract. The graph partitioning problem is widely used and studied in many practical and theoretical
applications. Multilevel strategies currently represent one of the most effective and efficient generic
frameworks for solving this problem on large-scale graphs. Most of the attention in designing
multilevel partitioning frameworks has been on the refinement phase. In this work we focus on the
coarsening phase, which is responsible for creating smaller graphs that are structurally similar to the
original one. We compare different matching- and AMG-based coarsening schemes, experiment with the
algebraic distance between nodes, and demonstrate computational results on several classes of graphs
that emphasize the running time and quality advantages of different coarsenings.

1 Introduction 
Graph partitioning is a class of problems used in many fields of computer science and engineering. Applications
include VLSI design, load balancing for parallel computations, network analysis, and optimal scheduling.
The goal is to partition the vertices of a graph into a certain number of disjoint sets of approximately the
same size, so that a cut metric is minimized. This problem is NP-complete even for several restricted classes
of graphs, and there is no constant factor approximation algorithm for general graphs [ ]. In this paper we
focus on a version of the problem that constrains the maximum block size to (1 + ε) times the average block
size and tries to minimize the total cut size, namely, the number of edges that run between blocks.

Because of the practical importance, many heuristics of different nature (spectral [19], combinatorial 
[10], evolutionist [4,24], etc.) have been developed to provide an approximate result in a reasonable (and, 
one hopes, linear) computational time. We refer the reader to [11,26,33] for more material. However, only 
the introduction of the general-purpose multilevel methods during the 1990s has provided a breakthrough 
in efficiency and quality. The basic idea can be traced back to multigrid solvers for elliptic partial
differential equations [31], but more recent practical methods are based mostly on graph-theoretic aspects,
in particular, edge contraction and local search. Well-known software packages based on this approach
include Jostle [33], Metis [20], DiBaP [ ], and Scotch [18]. 

A multilevel algorithm consists of two main phases: coarsening, in which the problem instance is gradually
mapped to smaller ones to reduce the original complexity (i.e., the graph underlying the problem is
compressed), and uncoarsening, in which the solution for the original instance is constructed by using the
information inherited from the solutions created at the next coarser levels. So far, most of the attention
in designing the multilevel partitioning frameworks has been on the uncoarsening phase. In this work we 
focus on the coarsening phase, which is responsible for creating graphs that are smaller than but structurally 
similar to the given graph. We compare different coarsening schemes, introduce new elements to them, and 
demonstrate computational results. For this purpose different coarsening schemes are integrated into the 
graph partitioning framework KaFFPa [25]. 

The paper is organized as follows. We begin in Section 2 by introducing notation and the multilevel ap- 
proach. In Section 3 we describe different coarsening schemes, including a novel algebraic, multigrid-inspired 
balanced coarsening scheme and matching-based coarsening schemes, as well as new measures for connectiv- 
ity. We present a large experimental evaluation in Section 4 on graphs arising in real-world applications and 
on graphs that are specifically designed to be hard for multilevel algorithms. 
2 Preliminaries

Consider an undirected graph $G = (V, E, c, \omega)$ with edge weights³ $\omega : E \to \mathbb{R}_{>0}$, node weights $c : V \to \mathbb{R}_{\geq 0}$,
$n = |V|$, and $m = |E|$. We extend $c$ and $\omega$ to sets; in other words, $c(V') := \sum_{v \in V'} c(v)$ and $\omega(E') := \sum_{e \in E'} \omega(e)$.
Here, $\Gamma(v) := \{u : \{v, u\} \in E\}$ denotes the neighbors of $v$. We are looking for blocks of nodes
$V_1, \ldots, V_k$ that partition $V$, namely, $V_1 \cup \cdots \cup V_k = V$ and $V_i \cap V_j = \emptyset$ for $i \neq j$. The balancing constraint
demands that $\forall i \in \{1..k\} : c(V_i) \leq L_{\max} := (1 + \epsilon)c(V)/k + \max_{v \in V} c(v)$ for some parameter $\epsilon$. The last term
in this equation arises because each node is atomic and therefore a deviation of the heaviest node has to be
allowed. The objective is to minimize the total cut $\sum_{i<j} \omega(E_{ij})$, where $E_{ij} := \{\{u, v\} \in E : u \in V_i, v \in V_j\}$.
A vertex $v \in V_i$ that has a neighbor $w \in V_j$, $i \neq j$, is a boundary vertex. We denote by $\mathrm{nnzr}(A, i)$ and
$\mathrm{nnzc}(A, i)$ the number of nonzero entries in the $i$th row or column of a matrix $A$, respectively.
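The objective and the balancing constraint can be made concrete with a short Python sketch. It assumes a representation of our own choosing (edge weights in a dict keyed by `frozenset({u, v})`, blocks numbered 0..k-1; the helper names are hypothetical) and simply evaluates the total cut and checks c(V_i) ≤ L_max for every block.

```python
from collections import defaultdict


def total_cut(edges, block):
    """Total cut: sum of omega(e) over edges whose endpoints lie in different blocks.

    edges: dict frozenset({u, v}) -> omega({u, v})
    block: dict node -> block index in 0..k-1
    """
    cut = 0
    for e, w in edges.items():
        u, v = tuple(e)
        if block[u] != block[v]:
            cut += w
    return cut


def l_max(node_weight, k, eps):
    """L_max = (1 + eps) * c(V) / k + max_v c(v)."""
    c_V = sum(node_weight.values())
    return (1 + eps) * c_V / k + max(node_weight.values())


def is_balanced(node_weight, block, k, eps):
    """True iff c(V_i) <= L_max holds for every block i."""
    load = defaultdict(float)
    for v, b in block.items():
        load[b] += node_weight[v]
    bound = l_max(node_weight, k, eps)
    return all(load[i] <= bound for i in range(k))
```

For a path 0-1-2-3 with unit weights split as {0, 1} | {2, 3}, `total_cut` returns 1, and for k = 2 and ε = 0.03 the bound is L_max = 1.03 · 4/2 + 1 = 3.06.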

A matching $M \subseteq E$ is a set of edges that do not share any common nodes; that is, the graph $(V, M)$
has maximum degree one. Contracting an edge $\{u, v\}$ means replacing the nodes $u$ and $v$ by a new node $x$
connected to the former neighbors of $u$ and $v$. We set $c(x) = c(u) + c(v)$ so the weight of a node at each level
is the number of nodes it is representing in the original graph. If replacing edges of the form $\{u, w\}, \{v, w\}$
would generate two parallel edges $\{x, w\}$, we insert a single edge with $\omega(\{x, w\}) = \omega(\{u, w\}) + \omega(\{v, w\})$.
Uncontracting an edge e undoes its contraction. 
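Contracting all edges of a matching at once can be sketched as follows. This is an illustrative Python version under our own assumptions (dict-based graph, a hypothetical function name, coarse nodes numbered consecutively), not KaFFPa's implementation; it adds node weights and merges parallel edges exactly as described above.

```python
def contract_matching(node_weight, edges, matching):
    """Contract every edge of a matching.

    node_weight: dict node -> c(node)
    edges:       dict frozenset({u, v}) -> omega({u, v})
    matching:    iterable of (u, v) pairs that share no endpoints
    Returns (coarse_node_weight, coarse_edges, fine_to_coarse).
    """
    fine_to_coarse, next_id = {}, 0
    for u, v in matching:                # matched pair -> one coarse node x
        fine_to_coarse[u] = fine_to_coarse[v] = next_id
        next_id += 1
    for v in node_weight:                # unmatched nodes are copied
        if v not in fine_to_coarse:
            fine_to_coarse[v] = next_id
            next_id += 1

    coarse_weight = {}
    for v, c in node_weight.items():     # c(x) = c(u) + c(v)
        x = fine_to_coarse[v]
        coarse_weight[x] = coarse_weight.get(x, 0) + c

    coarse_edges = {}
    for e, w in edges.items():
        u, v = tuple(e)
        a, b = fine_to_coarse[u], fine_to_coarse[v]
        if a == b:
            continue                     # the contracted edge itself disappears
        key = frozenset((a, b))          # parallel edges {x, w} are merged
        coarse_edges[key] = coarse_edges.get(key, 0) + w
    return coarse_weight, coarse_edges, fine_to_coarse
```

Keeping `fine_to_coarse` makes uncontraction trivial: a fine node simply inherits the block of its coarse node.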



2.1 Multilevel Graph Partitioning 

In the multilevel framework we construct a hierarchy of decreasing-size graphs, $G_0, G_1, \ldots, G_k$, by coarsening,
starting from the given graph $G_0$ such that each next-coarser graph $G_i$ reflects basic properties of the previous
graph $G_{i-1}$. At the coarsest level $G_k$ is partitioned by a hybrid of external solvers, and starting from the
$(k-1)$th level the solution is projected gradually (level by level) to the finest level. Each projection is followed by
refinement, which moves nodes between the blocks in order to reduce the size of the cut. This entire process
is called a V-cycle (see Figure 1). KaFFPa [25] extended the concept of iterated multilevel algorithms, which
was introduced for graph partitioning by Walshaw et al. [ ].
The main idea is to iterate the multilevel scheme using different random seeds for coarsening and uncoarsening.
This ensures non-decreasing partition quality, since the refinement algorithms of KaFFPa guarantee no worsening.
In this paper, for the purpose of comparison, we also consider F-cycles [25] (see Figure 1) as a potentially
stronger and slower version of the multilevel framework for the graph partitioning problem. The detailed
description of F-cycles for the multilevel graph partitioning framework can be found in [25].

[Figure 1: schematic of a V-cycle and an F-cycle over the graph hierarchy down to G_k.]
Fig. 1. V- and F-cycle schemes.
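The V-cycle can be summarized as a recursive driver. The Python skeleton below is a generic sketch under our own assumptions: the callables `coarsen`, `initial_partition`, `project`, `refine`, and `is_small` are hypothetical placeholders for the actual components (matching or AMG coarsening, Scotch/Metis, FM and flow refinement), not KaFFPa's interface.

```python
def v_cycle(graph, coarsen, initial_partition, project, refine, is_small):
    """One multilevel V-cycle, written recursively.

    coarsen(graph)            -> (coarse_graph, mapping from fine to coarse nodes)
    initial_partition(graph)  -> partition of the coarsest graph
    project(partition, map)   -> partition of the next finer graph
    refine(graph, partition)  -> partition whose cut is not larger
    is_small(graph)           -> True once coarsening should stop
    """
    if is_small(graph):
        # coarsest level: use an (expensive) initial partitioner
        return refine(graph, initial_partition(graph))
    coarse, mapping = coarsen(graph)
    coarse_partition = v_cycle(coarse, coarsen, initial_partition,
                               project, refine, is_small)
    # uncoarsening: project the coarse solution one level down, then refine
    return refine(graph, project(coarse_partition, mapping))
```

Because every level ends with `refine`, a refinement that never worsens the cut keeps the projected solution at least as good as the one inherited from the coarser level.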



3 Coarsening Schemes 

One of the most important concerns of multilevel schemes is a measure of the connection strength between
vertices. For matching-based coarsening schemes, experiments indicate that more sophisticated edge rating
functions are superior to edge weight as a criterion for the matching algorithm [ ]. More precisely, the edges
are first rated using a rating function that indicates how much sense it makes to contract an edge. Then
a matching algorithm tries to maximize the sum of the ratings of the edges to be contracted. The default
configurations of KaFFPa employ the ratings

$\mathrm{expansion}^{*2}(\{u, v\}) := \omega(\{u, v\})^2 / (c(u)c(v))$, and
$\mathrm{innerOuter}(\{u, v\}) := \omega(\{u, v\}) / (\mathrm{Out}(u) + \mathrm{Out}(v) - 2\omega(\{u, v\}))$,

where $\mathrm{Out}(v) := \sum_{x \in \Gamma(v)} \omega(\{v, x\})$, since they yielded the best results in [12].
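Both ratings are cheap local formulas. The Python sketch below evaluates them on an adjacency dict (our own representation; function names are hypothetical), with Out(v) the total weight of edges incident to v and the innerOuter denominator taken as Out(u) + Out(v) − 2ω({u, v}). Returning infinity when that denominator is zero (an edge whose endpoints have no other neighbors) is our assumption, not something stated in the text.

```python
def expansion_star2(adj, c, u, v):
    """expansion*2({u, v}) = omega({u, v})^2 / (c(u) c(v))."""
    return adj[u][v] ** 2 / (c[u] * c[v])


def out(adj, v):
    """Out(v): total weight of the edges incident to v."""
    return sum(adj[v].values())


def inner_outer(adj, u, v):
    """innerOuter({u, v}) = omega({u, v}) / (Out(u) + Out(v) - 2 omega({u, v}))."""
    denom = out(adj, u) + out(adj, v) - 2 * adj[u][v]
    return float("inf") if denom == 0 else adj[u][v] / denom
```

With unit edge and node weights, `expansion_star2` returns 1 for every edge, which is why a weight-independent rating such as innerOuter is needed on the first level of unweighted graphs.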

3 Subscripts will be used for a short notation; i.e., $\omega_{ij}$ corresponds to the weight of $\{i, j\} \in E$.
Algebraic distance for graph partitioning. The notion of algebraic distance introduced in [21,6] is based
on the principle of obtaining low-residual error components used in the Bootstrap AMG [3]. When a priori
knowledge of the nature of this error is not available, slightly relaxed random vectors are used to approximate
it. This principle was used for linear ordering problems to distinguish between local and global edges [ ].
The main difference between the k-partitioning problem and other (not necessarily combinatorial) problems
for which the algebraic distance has been tested so far is the balancing constraint. For many instances,
it is important to keep the coarsening balanced; otherwise, even though the structural information will be
captured by a sophisticated coarsening procedure, most of the actual computational work that constructs
the approximate solution will be done by the refinement iterations. Bounding the number of refinement
iterations may then dramatically decrease the solution quality. Thus, a volume-normalized algebraic distance
is introduced to take into account the balancing of vertices.

Given the Laplacian of a graph $L = D - W$, where $W$ is a weighted adjacency matrix of a graph and
$D$ is the diagonal matrix with entries $D_{ii} = \sum_j \omega_{ij}$, we define its volume-normalized version, denoted by
$\tilde{L} = \tilde{D} - \tilde{W}$, based on volume-normalized edge weights $\tilde{\omega}_{ij} = \omega_{ij} / (c(i)c(j))$. We define an iteration matrix
$H$ for Jacobi over-relaxation (also known as a lazy random-walk matrix) as

$H = (1 - \alpha) I + \alpha \tilde{D}^{-1} \tilde{W}$,

where $0 < \alpha < 1$. The algebraic distance coupling is defined as

$\rho_{ij} := \left( \sum_{r=1}^{R} \left| x_i^{(k,r)} - x_j^{(k,r)} \right|^2 \right)^{1/2}$,

where $x^{(k,r)} = H^k x^{(0,r)}$ is a relaxed randomly initialized test vector (i.e., $x^{(0,r)}$ is a random vector sampled
over $[-1/2, 1/2]$), $R$ is the number of test vectors, and $k$ is the number of iterations. In our experimental
settings we set $\alpha = 0.5$, $R = 5$, and $k = 20$.
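The following Python sketch computes these volume-normalized algebraic distances for all edges. It assumes the ℓ2 form ρ_ij = (Σ_r |x_i^(k,r) − x_j^(k,r)|²)^{1/2}, uses Python's `random.Random` as the generator for the test vectors, and omits any rescaling of the vectors between sweeps; the dict-based representation and function name are our own. Intuitively, relaxation smooths the test vectors so that tightly connected nodes get similar values, and a small ρ_ij indicates a strong connection.

```python
import math
import random


def algebraic_distances(adj, c, alpha=0.5, R=5, k=20, seed=0):
    """Volume-normalized algebraic distances rho_ij for every edge {i, j}.

    adj: dict node -> dict neighbor -> omega_ij (symmetric, no self-loops)
    c:   dict node -> node weight c(i) > 0
    Returns dict (i, j) with i < j -> rho_ij.
    """
    rng = random.Random(seed)
    # volume-normalized weights: omega~_ij = omega_ij / (c(i) c(j))
    wn = {i: {j: w / (c[i] * c[j]) for j, w in nbrs.items()}
          for i, nbrs in adj.items()}
    deg = {i: sum(nbrs.values()) for i, nbrs in wn.items()}  # diagonal of D~
    sq = {(i, j): 0.0 for i in adj for j in adj[i] if i < j}

    def relax(x):
        # one Jacobi sweep: x <- H x with H = (1 - alpha) I + alpha D~^{-1} W~
        return {i: (1 - alpha) * x[i]
                   + alpha * (sum(w * x[j] for j, w in wn[i].items()) / deg[i]
                              if deg[i] > 0 else x[i])
                for i in adj}

    for _ in range(R):
        x = {i: rng.uniform(-0.5, 0.5) for i in adj}  # x^(0,r)
        for _ in range(k):
            x = relax(x)                               # x^(k,r) = H^k x^(0,r)
        for (i, j) in sq:
            sq[(i, j)] += (x[i] - x[j]) ** 2
    return {e: math.sqrt(s) for e, s in sq.items()}
```

The defaults mirror the experimental settings α = 0.5, R = 5, and k = 20; on a single edge one sweep already makes both entries equal, so its distance is 0.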

3.1 Coarsening 

To the best of our knowledge, the existing multilevel algorithms for combinatorial optimization problems
(such as k-partitioning, linear ordering, clustering, and segmentation) can be divided into two classes:
contraction-based schemes [25,8,14] (including contractions of small subsets of nodes [ ]) and algebraic
multigrid (AMG)-inspired schemes [21,13,27,20].

AMG-inspired coarsening. One of the most traditional approaches for deriving the coarse systems
in AMG is the Galerkin operator [31], which projects the fine system of equations to the coarser scale. In
the context of graphs this projection is defined as

$L_c = P^T L_f P$,   (1)

where $L_f$ and $L_c$ are the Laplacians of the fine and coarse graphs $G_f = (V_f, E_f)$ and $G_c = (V_c, E_c)$, respectively.
The $(i, J)$th entry of the projection matrix $P$ represents the strength of the connection between fine node $i$ and
coarse node $J$. The entries of $P$, called interpolation weights, describe both the coarse-to-fine and fine-to-coarse
relations between nodes.
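A toy computation of the Galerkin product makes the dimensions explicit: with one row of P per fine node and one column per coarse node, P^T L_f P is a |V_c| × |V_c| matrix. The pure-Python sketch below (dense lists, hypothetical helper names, toy sizes only) splits the middle node of a path 0-1-2 evenly between two aggregates. Since each row of P sums to 1, P·1 = 1 and therefore every row of L_c sums to zero.

```python
def laplacian(n, edges):
    """Dense Laplacian L = D - W for nodes 0..n-1.

    edges: dict (i, j) -> omega_ij, each undirected edge listed once
    """
    L = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def galerkin(P, L_f):
    """Coarse operator P^T L_f P (rows of P: fine nodes, columns: coarse nodes)."""
    return matmul(matmul(transpose(P), L_f), P)


L_f = laplacian(3, {(0, 1): 1.0, (1, 2): 1.0})   # path 0-1-2
P = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]          # node 1 split between two aggregates
L_c = galerkin(P, L_f)                            # [[0.5, -0.5], [-0.5, 0.5]]
```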

The coarsening begins by selecting a dominating set of (seed or coarse) nodes $C \subset V_f$ such that all other
(fine) nodes in $F = V_f \setminus C$ are strongly coupled to $C$. This selection can be done by traversing all nodes and
leaving in $F$ (initially $F = V_f$ and $C = \emptyset$) those nodes $i$ that satisfy

$\sum_{j \in C} 1/\rho_{ij} \geq \Theta \cdot \sum_{j \in V_f} 1/\rho_{ij}$,   (2)

where $\Theta$ is a parameter of coupling strength; every other node is moved to $C$. As in AMG-based approaches for
linear ordering problems [22], we observed that the order in which $V_f$ is traversed does play an important role in
reducing the dependence on random seeds (for details on future volume ordering see [21]).
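Seed selection by inequality (2) is a single pass over V_f. The Python sketch below restricts both sums to the neighbors of i (our assumption, since ρ is only computed on edges), makes isolated nodes seeds, and takes the traversal order as an input; `select_seeds` and its parameters are hypothetical names, and any concrete Θ is for illustration only.

```python
def select_seeds(order, neighbors, rho, theta):
    """Split V_f into seeds C and fine nodes F following inequality (2).

    order:     traversal order of V_f (hypothetical, e.g., by future volume)
    neighbors: dict i -> iterable of neighbors j of i
    rho:       function (i, j) -> algebraic distance rho_ij > 0
    theta:     coupling-strength parameter Theta
    """
    order = list(order)
    C = set()
    for i in order:
        total = sum(1.0 / rho(i, j) for j in neighbors[i])
        to_seeds = sum(1.0 / rho(i, j) for j in neighbors[i] if j in C)
        # i stays in F only if it is strongly coupled to the current seeds
        if total == 0 or to_seeds < theta * total:
            C.add(i)
    F = set(order) - C
    return C, F
```

On a path 0-1-2-3-4 with constant ρ, Θ = 0.5, and the natural order, every second node becomes a seed: C = {0, 2, 4} and F = {1, 3}.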

The Galerkin operator construction differs from other AMG-based approaches for combinatorial optimization
problems. Balancing constraints of the partitioning problem require a limited number of fine-to-coarse
attractions between $i \in C$ ($i$th column in $P$) and its neighbors from $F$ (nonzero entries in the $i$th column
in $P$). In particular, this is important for graphs where the number of high-degree nodes in $C$ is smaller
than the number of parts in the desired partition. Another well-known problem of AMG that can affect the
performance of the solver is the complexity of coarse levels. Consideration of the algebraic distance makes it
possible to reduce the order of interpolation (the number of fractions into which a node from $F$ can be divided)
to only 1 or 2 [ ]. Algorithm 1 summarizes the construction of $P$.
input: G, i ∈ V_f, P
if i ∈ C then
    P_{i,I(i)} ← 1
else
    l ← list of at most k algebraically strongest connections of i to C
    {e_1, e_2} ← algebraically strongest pair of edges (according to ρ_{e_1} + ρ_{e_2}) in l such that the
        corresponding C-neighbors are not overloaded if i is divided between them
    if {e_1, e_2} ≠ ∅ then
        l ← {e_1, e_2}
    else
        e_1 ← algebraically strongest connection of i to C such that the corresponding C-neighbor is not
            overloaded if i is aggregated with it
        l ← {e_1}
    if l is empty then
        move i to C
    else
        N_i ← C-neighbors of i that are adjacent to edges in l
        P_{i,I(j)} ← (1/ρ_{ij}) / Σ_{k ∈ N_i} (1/ρ_{ik}) for j ∈ N_i
        update future volumes of j ∈ N_i

Algorithm 1: Interpolation weights for P



Algorithm 1 can be viewed as a simplified version of bootstrap AMG [3] with the additional restriction
on the future volume of aggregates and an adaptive interpolation order. $P_{i,I(j)}$ thus represents the likelihood of $i$
belonging to the $I(j)$th aggregate. The edge connecting two coarse aggregates $p$ and $q$ is assigned the
weight $\omega_{pq} = \sum_{k \neq l} P_{kp} \omega_{kl} P_{lq}$. The volume of the $p$th coarse aggregate is $\sum_j c(j) P_{jp}$. We emphasize the
property of adaptivity of $C$ (the step of Algorithm 1 that moves $i$ to $C$), which is updated if the balancing of
aggregates is not satisfied.
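The interpolation step for a single F-node can be sketched in Python as below. This is a simplified reading of Algorithm 1 under stated assumptions: "algebraically strongest" is read as "smallest ρ", "not overloaded" means that a seed's future volume plus its share of c(i) stays within a hypothetical `capacity`, and `max_candidates` plays the role of the list length bound k. The function returns `None` when i has to be moved to C.

```python
def interpolate_node(rho_to_seeds, future_volume, c_i, capacity, max_candidates=4):
    """Interpolation weights of one F-node i (simplified reading of Algorithm 1).

    rho_to_seeds:  dict seed j -> rho_ij for the C-neighbors j of i
    future_volume: dict seed j -> future volume of j's aggregate (updated in place)
    c_i:           weight of node i
    capacity:      hypothetical upper bound on the future volume of an aggregate
    Returns dict seed j -> P[i, I(j)], or None if i has to be moved to C.
    """
    by_strength = sorted(rho_to_seeds, key=rho_to_seeds.get)  # smallest rho first

    def weights(seeds):
        # P[i, I(j)] proportional to 1/rho_ij, normalized over the chosen seeds
        inv = {j: 1.0 / max(rho_to_seeds[j], 1e-12) for j in seeds}
        total = sum(inv.values())
        return {j: v / total for j, v in inv.items()}

    def fits(w):
        return all(future_volume[j] + share * c_i <= capacity for j, share in w.items())

    cand = by_strength[:max_candidates]
    best = None  # (rho_e1 + rho_e2, weights) of the strongest admissible pair
    for a in range(len(cand)):
        for b in range(a + 1, len(cand)):
            w = weights((cand[a], cand[b]))
            score = rho_to_seeds[cand[a]] + rho_to_seeds[cand[b]]
            if fits(w) and (best is None or score < best[0]):
                best = (score, w)
    if best is not None:
        w = best[1]
    else:
        # fall back to the strongest single seed that is not overloaded
        single = next((j for j in by_strength if fits({j: 1.0})), None)
        if single is None:
            return None  # no admissible aggregate: i becomes a seed
        w = {single: 1.0}
    for j, share in w.items():
        future_volume[j] += share * c_i  # update future volumes of chosen seeds
    return w
```

Limiting each node to at most two seeds keeps the interpolation order at 1 or 2, which is what keeps the coarse levels sparse.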

We mention the difference between our AMG scheme and the weighted aggregation (WAG) scheme in
[7]. The common principle that works in both schemes is based on the division of F-nodes between their
C-neighbors. However, two critical components are missing in [7]: (1) the algebraic distance that forms both
the set of seeds and the interpolation operator; and (2) the weight-balancing algorithmic component when
aggregates are created, namely, operator P in [7] is created as in classical AMG schemes. One important
disadvantage of [7] is a relatively high density of coarse levels, which is eliminated with the introduction of
the algebraic distance. This was achieved by reducing the order of interpolation to 1 or 2. The balancing
factor played an important role in reducing the running time of the algorithm. Recently introduced
max-flow/min-cut refinement leads to noticeably better results than FM/KL heuristics (explained in Section
3.3). In contrast to simple FM/KL swaps, however, its complexity becomes higher if the aggregates are
unbalanced with respect to the maximum size of one part. Applying this refinement with unbalanced WAG
can significantly increase the total running time of the solver or lead to weak solutions if the refinement
is terminated before it finds a good local minimum. Overall, the performance of our AMG scheme differs
significantly from what we observed with WAG.

Matching-based coarsening. Another coarsening framework, which is more popular because of its
simplicity and faster performance, is the matching-based scheme. In this scheme a coarse graph is constructed
by contracting the edges of a precomputed matching. This scheme represents a special case of the Galerkin
operator (1) in which $\mathrm{nnzr}(P, r) = 1$ for all rows $r$ in $P$, and $1 \leq \mathrm{nnzc}(P, c) \leq 2$ for all columns $c$ in $P$.
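The special case can be checked on a small example. The Python sketch below (hypothetical helper names, dense lists) builds the 0/1 matrix P induced by a matching, with one row per fine node and one column per coarse node, verifies the nonzero counts stated above, and applies the coarse-weight formula ω_pq = Σ P_kp ω_kl P_lq. The result coincides with plain contraction, including the merging of parallel edges.

```python
def matching_prolongation(n, matching):
    """0/1 matrix P (rows: fine nodes 0..n-1, columns: coarse nodes) of a matching."""
    coarse_of, nc = {}, 0
    for u, v in matching:
        coarse_of[u] = coarse_of[v] = nc
        nc += 1
    for v in range(n):
        if v not in coarse_of:  # unmatched node -> its own coarse node
            coarse_of[v] = nc
            nc += 1
    P = [[0] * nc for _ in range(n)]
    for v, J in coarse_of.items():
        P[v][J] = 1
    return P


def nnz_rows_cols(P):
    rows = [sum(1 for x in row if x != 0) for row in P]
    cols = [sum(1 for row in P if row[j] != 0) for j in range(len(P[0]))]
    return rows, cols


def coarse_weights(P, W):
    """omega_pq = sum_{k,l} P_kp omega_kl P_lq for p != q (diagonal set to 0)."""
    n, nc = len(P), len(P[0])
    Wc = [[sum(P[k][p] * W[k][l] * P[l][q] for k in range(n) for l in range(n))
           for q in range(nc)] for p in range(nc)]
    for p in range(nc):
        Wc[p][p] = 0  # the diagonal only collects contracted (internal) edges
    return Wc
```

For a triangle with weights ω_01 = 1, ω_02 = 2, ω_12 = 3 and the matching {(0, 1)}, the coarse edge gets weight 2 + 3 = 5.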

Global Paths Algorithm. The Global Paths Algorithm (GPA) was proposed in [16] as a synthesis of Greedy
and Path Growing algorithms [ ]. Similar to the Greedy approach, GPA scans the edges in order of decreasing
weight (or rating); but rather than immediately building a matching, it first constructs a collection of paths
and even-length cycles. To be more precise, these paths initially contain no edges. While scanning the edges,
the set is then extended by successively adding applicable edges. An edge is called applicable if it connects
two endpoints of different paths or the two endpoints of an odd-length path. Afterwards, optimal solutions
are computed for each of these paths and cycles using dynamic programming. KaFFPaStrong [25] employs
innerOuter on the first level of the hierarchy, since expansion*2 evaluates to one on unweighted graphs.
Afterwards it uses expansion*2.
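A compact Python sketch of GPA is shown below; it is our own illustrative version, not KaFFPa's code. Edges are scanned by decreasing rating; union-find tracks the path components and their edge counts so that only applicable edges are kept; afterwards a dynamic program computes a maximum-rating matching on every path, and every even cycle is solved as the better of two paths (closing edge unused, or first edge unused). It assumes each undirected edge appears once in the input.

```python
def gpa_matching(n, edges):
    """Global Paths Algorithm (sketch) on nodes 0..n-1.

    edges: list of (rating, u, v), each undirected edge listed once.
    Returns a list of matched pairs.
    """
    parent = list(range(n))  # union-find over the path/cycle components

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_edges = [0] * n             # number of selected edges per component root
    deg = [0] * n
    adj = [[] for _ in range(n)]  # selected edges as (neighbor, rating)
    for r, u, v in sorted(edges, reverse=True):  # decreasing rating
        if u == v or deg[u] == 2 or deg[v] == 2:
            continue
        ru, rv = find(u), find(v)
        if ru != rv:                    # connects endpoints of two different paths
            parent[ru] = rv
            n_edges[rv] += n_edges[ru] + 1
        elif n_edges[ru] % 2 == 1:      # closes an odd-length path: even cycle
            n_edges[ru] += 1
        else:
            continue
        deg[u] += 1
        deg[v] += 1
        adj[u].append((v, r))
        adj[v].append((u, r))

    def best_on_path(ws):
        """Dynamic program: maximum-rating matching on a path with edge ratings ws."""
        dp = [0.0] * (len(ws) + 1)
        for t in range(1, len(ws) + 1):
            dp[t] = max(dp[t - 1], ws[t - 1] + (dp[t - 2] if t >= 2 else 0.0))
        chosen, t = [], len(ws)
        while t >= 1:                   # trace back the chosen edge indices
            if dp[t] == dp[t - 1]:
                t -= 1
            else:
                chosen.append(t - 1)
                t -= 2
        return dp[-1], chosen

    visited = [False] * n

    def walk(start):
        nodes, ws, cur = [start], [], start
        visited[start] = True
        while True:
            step = next(((x, r) for x, r in adj[cur] if not visited[x]), None)
            if step is None:
                return nodes, ws
            cur, r = step
            visited[cur] = True
            nodes.append(cur)
            ws.append(r)

    matching = []
    for s in range(n):                  # paths start at an endpoint (degree <= 1)
        if not visited[s] and deg[s] <= 1:
            nodes, ws = walk(s)
            matching += [(nodes[t], nodes[t + 1]) for t in best_on_path(ws)[1]]
    for s in range(n):                  # the remaining components are even cycles
        if not visited[s]:
            nodes, ws = walk(s)
            ring = nodes + [s]
            ws_closed = ws + [next(r for x, r in adj[nodes[-1]] if x == s)]
            v1, idx1 = best_on_path(ws_closed[:-1])  # closing edge unused
            v2, idx2 = best_on_path(ws_closed[1:])   # first edge unused
            if v1 >= v2:
                matching += [(ring[t], ring[t + 1]) for t in idx1]
            else:
                matching += [(ring[t + 1], ring[t + 2]) for t in idx2]
    return matching
```

On a path with ratings 2, 3, 2 a greedy matching would take only the middle edge (total 3), whereas the dynamic program picks the two outer edges (total 4).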

RandomGPA Algorithm. This algorithm is used by the classic KaFFPaEco configuration. It is a synthesis
of the simplest random matching algorithm and the GPA algorithm. To be more precise, this matching
algorithm depends on the number of blocks into which the graph has to be partitioned. It matches the first
max{2, 7 − log k} levels using the random matching algorithm and switches to the GPA algorithm afterwards.
The random matching algorithm traverses the nodes in a random order and, if the current node is not already
matched, chooses a random unmatched neighbor for the matching. KaFFPaEco employs expansion*2 as a
rating function as soon as it uses GPA.
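The random phase of RandomGPA is easy to sketch in Python. The random matching below follows the description above; the level count uses max{2, 7 − log k} with a base-2 logarithm rounded down, which is our assumption since the base is not stated. Function names are hypothetical.

```python
import math
import random


def random_matching(adj, rng):
    """Visit nodes in random order; match each unmatched node with a
    uniformly random unmatched neighbor, if one exists."""
    order = list(adj)
    rng.shuffle(order)
    mate = {}
    for v in order:
        if v in mate:
            continue
        free = [u for u in adj[v] if u != v and u not in mate]
        if free:
            u = rng.choice(free)
            mate[u], mate[v] = v, u
    return sorted({(min(u, v), max(u, v)) for u, v in mate.items()})


def random_matching_levels(k):
    """Levels matched randomly before switching to GPA: max{2, 7 - log k}
    (base-2 logarithm, rounded down: an assumption)."""
    return max(2, 7 - int(math.log2(k)))
```

Under this reading, a bipartition (k = 2) uses random matching on the first 6 levels, while k = 64 uses it only on the first 2.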

3.2 The Coarsest Level 

Contraction is stopped when the graph is small enough to be partitioned by some other expensive algorithm.
We use the same initial partitioning scheme as in KaFFPa [25], namely, the libraries Scotch and Metis for
initial partitioning. For AMG, some modifications have to be made since Scotch and Metis cannot deal with
fractional numbers and Metis expects $\omega_{ij} \geq 1$. To overcome this implementation problem, we perform the
following two steps. First, we divide each edge weight of the coarsest graph by the smallest edge weight that
occurred on that level. This step ensures edge weights larger than or equal to one without skewing the graph
partitioning problem for the library used. Second, we get rid of the fractional edge weights using randomized
rounding. Let $e \in E$ be an edge with fractional edge weight. We then obtain an integer edge weight $\tilde{\omega}(e)$ by
flipping a coin with probabilities $\mathcal{P}(\text{head}) = \omega(e) - \lfloor \omega(e) \rfloor$ and $\mathcal{P}(\text{tail}) = 1 - \mathcal{P}(\text{head})$. In the case of heads we
set the edge weight $\tilde{\omega}(e)$ to $\lceil \omega(e) \rceil$; otherwise we set it to $\lfloor \omega(e) \rfloor$. The rounded weight thus equals $\omega(e)$ in
expectation. This way we can ensure that the value of the cut in the graph $\tilde{G} = (V_k, E_k, \tilde{\omega})$ produced by the
external initial partitioning algorithm is close to the real cut value in $G_k$.
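Both steps fit in a few lines of Python. The sketch below (hypothetical function name, edges as dict keys of any type, `random.Random` as the coin) scales by the smallest weight and then rounds each scaled weight x up with probability x − ⌊x⌋, so the expected rounded value is ⌊x⌋ + (x − ⌊x⌋) = x.

```python
import math
import random


def integer_edge_weights(weights, rng):
    """Scale edge weights by the smallest one, then round them randomly.

    weights: dict edge -> positive (possibly fractional) weight
    Each scaled weight x becomes ceil(x) with probability x - floor(x) and
    floor(x) otherwise, so the rounded value equals x in expectation.
    """
    w_min = min(weights.values())
    rounded = {}
    for e, w in weights.items():
        x = w / w_min  # x >= 1 after scaling
        p_head = x - math.floor(x)
        rounded[e] = math.ceil(x) if rng.random() < p_head else math.floor(x)
    return rounded
```

Integral scaled weights are left unchanged, because their probability of heads is zero.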

3.3 Uncoarsening 

Recall that uncoarsening undoes contraction. For AMG-based coarsening this means that fine nodes have
to be assigned to blocks of the partition of the finer graph in the hierarchy. We assign a fine node $v$ to the
block $B$ that minimizes $\mathrm{cut}_B \cdot p_B(v)$, where $\mathrm{cut}_B$ is the cut after $v$ would be assigned to block $B$ and $p_B(v)$ is
a penalty function to avoid blocks that are heavily overloaded. To be more precise, after some experiments
we fixed the penalty function to $p_B(v) = 2^{\max(0,\, 100 (c(B) + c(v) - L_{\max}) / L_{\max})}$, where $L_{\max}$ is the upper bound for the block
weight. Note that slight imbalances (e.g., slightly overloaded blocks) can usually be fixed by the refinement algorithms
implemented within KaFFPa. For matching-based coarsening the uncoarsening is straightforward: a vertex
is assigned to the block of the corresponding coarse vertex.
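The assignment rule for a fine node can be sketched in Python as follows. The penalty is written in the assumed form p_B(v) = 2^{max(0, 100 (c(B) + c(v) − L_max)/L_max)}, which equals 1 as long as B does not overflow and doubles with every additional percent of overload. The representation of the partially assigned neighborhood (`conn`, `base_cut`, `loads`) is our own hypothetical choice.

```python
def penalty(load_B, c_v, L_max):
    """Assumed form p_B(v) = 2^max(0, 100 (c(B) + c(v) - L_max) / L_max)."""
    return 2.0 ** max(0.0, 100.0 * (load_B + c_v - L_max) / L_max)


def choose_block(c_v, conn, base_cut, loads, L_max):
    """Block B minimizing cut_B * p_B(v) for a fine node v.

    conn:     dict block -> weight of edges from v to already assigned neighbors in it
    base_cut: cut among the nodes that are already assigned
    loads:    dict block -> current weight c(B)
    """
    total = sum(conn.values())

    def score(B):
        cut_B = base_cut + total - conn.get(B, 0.0)  # cut after assigning v to B
        return cut_B * penalty(loads[B], c_v, L_max)

    return min(loads, key=score)
```

When both blocks have room, v follows its heavier connection; once a block would exceed L_max by 10%, its penalty reaches 2^10 = 1024 and v is diverted to the other block.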
Karlsruhe Fast Flow Partitioner (KaFFPa). Since we integrated different coarsening schemes into the multilevel
graph partitioner KaFFPa [25], we now briefly outline the techniques KaFFPa uses during uncoarsening.
After a matching is uncontracted, local search-based refinement algorithms move nodes between block
boundaries in order to reduce the cut while maintaining the balancing constraint. Local improvement algorithms
are usually variants of the FM algorithm [10]. The variant KaFFPa uses is organized in rounds. In each
round, a priority queue P is used that is initialized with all vertices that are incident to more than one block,
in a random order. The priority is based on the gain $g(i) = \max_P g_P(i)$, where $g_P(i)$ is the decrease in edge
cut when moving $i$ to block $P$. Local search then repeatedly looks for the highest-gain node $v$ and moves it
to the corresponding block that maximizes the gain. However, in each round a node is moved at most once.
After a node is moved, its unmoved neighbors become eligible, i.e., its unmoved neighbors are inserted into
the priority queue. When a stopping criterion is reached, all movements made after the best cut found
within the balance constraint are undone. This process is repeated several times until no improvement is
found.
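One FM round as described above can be sketched in Python. This is a simplified illustration, not KaFFPa's implementation: it uses a binary heap with lazy priority updates (stale entries are re-pushed with their current gain), only allows moves that keep the target block within L_max, stops when the queue is exhausted (KaFFPa uses a more elaborate stopping criterion), and rolls back all moves made after the best prefix.

```python
import heapq
import random


def fm_round(adj, c, block, k, L_max, rng):
    """One round of k-way FM local search (simplified sketch).

    adj: dict v -> dict u -> omega({u, v}); c: dict v -> c(v)
    block: dict v -> block in 0..k-1, modified in place.
    Returns the achieved reduction of the cut (never negative).
    """
    load = [0.0] * k
    for v, b in block.items():
        load[b] += c[v]

    def best_move(v):
        # g(v) = max_P g_P(v) over blocks P that can take v without exceeding L_max
        conn = {}
        for u, w in adj[v].items():
            conn[block[u]] = conn.get(block[u], 0.0) + w
        own = conn.get(block[v], 0.0)
        best = None
        for P, w in conn.items():
            if P != block[v] and load[P] + c[v] <= L_max:
                if best is None or w - own > best[0]:
                    best = (w - own, P)
        return best

    heap, counter = [], 0

    def push(v):
        nonlocal counter
        m = best_move(v)
        if m is not None:
            heapq.heappush(heap, (-m[0], counter, v))
            counter += 1

    boundary = [v for v in adj if any(block[u] != block[v] for u in adj[v])]
    rng.shuffle(boundary)  # initialize the queue in random order
    for v in boundary:
        push(v)

    moved, log = set(), []
    gain, best_gain, best_len = 0.0, 0.0, 0
    while heap:  # stopping criterion here: queue exhausted (assumption)
        neg_g, _, v = heapq.heappop(heap)
        if v in moved:
            continue  # each node moves at most once per round
        m = best_move(v)
        if m is None:
            continue
        if m[0] != -neg_g:  # stale priority: reinsert with the current gain
            push(v)
            continue
        g, P = m
        log.append((v, block[v]))
        load[block[v]] -= c[v]
        load[P] += c[v]
        block[v] = P
        moved.add(v)
        gain += g
        if gain > best_gain:
            best_gain, best_len = gain, len(log)
        for u in adj[v]:
            if u not in moved:
                push(u)  # unmoved neighbors become eligible

    for v, old in reversed(log[best_len:]):  # undo moves after the best cut
        load[block[v]] -= c[v]
        load[old] += c[v]
        block[v] = old
    return best_gain
```

Allowing negative-gain moves and then rolling back is what lets FM climb out of shallow local minima without ever returning a worse partition.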

Max-Flow Min-Cut Local Improvement. During the uncoarsening phase KaFFPa additionally uses more
advanced refinement algorithms. The first method is based on max-flow min-cut computations between pairs
of blocks, in other words, a method to improve a given bipartition. Roughly speaking, this improvement
method is applied between all pairs of blocks that share a nonempty boundary. The algorithm
constructs a flow problem by growing an area around the given boundary vertices of a pair of blocks such
that each s-t cut in this area yields a feasible bipartition of the original graph/pair of blocks within the
balance constraint. One can then apply a max-flow min-cut algorithm to obtain a min-cut in this area and
therefore a cut between the original pair of blocks that is no larger than the current one. This can be improved
in multiple ways, for example, by iteratively applying the method, searching in larger areas for feasible cuts,
and applying most balanced minimum cut heuristics. For more details we refer the reader to [25].
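The core of this refinement can be sketched in Python with Edmonds-Karp. Simplifying assumptions: the area is passed in rather than grown, the balance guarantee that KaFFPa's area construction provides is not checked, and the area must leave some nodes of each block outside it. Nodes of block a outside the area are contracted into the source "s", those of block b into the sink "t"; the source side of a minimum cut becomes block a.

```python
from collections import defaultdict, deque


def flow_refine(adj, block, a, b, area):
    """Min-cut improvement of the bipartition between blocks a and b (sketch).

    area: nodes of blocks a and b that may change sides.
    block is updated in place; returns the max-flow value, which equals the
    new cut between blocks a and b.
    """
    def node(v):
        if v in area:
            return v
        return "s" if block[v] == a else "t"

    cap, nbrs = defaultdict(float), defaultdict(set)
    for v in adj:
        if block[v] not in (a, b):
            continue
        for u, w in adj[v].items():
            if block[u] not in (a, b) or node(u) == node(v):
                continue
            cap[(node(v), node(u))] += w  # seen once from each endpoint
            nbrs[node(v)].add(node(u))

    flow = defaultdict(float)

    def bfs():  # breadth-first search in the residual graph
        prev, queue = {"s": None}, deque(["s"])
        while queue:
            x = queue.popleft()
            for y in nbrs[x]:
                if y not in prev and cap[(x, y)] - flow[(x, y)] > 1e-12:
                    prev[y] = x
                    queue.append(y)
        return prev

    while True:  # Edmonds-Karp: augment along shortest residual paths
        prev = bfs()
        if "t" not in prev:
            break
        path, y = [], "t"
        while prev[y] is not None:
            path.append((prev[y], y))
            y = prev[y]
        push = min(cap[e] - flow[e] for e in path)
        for x, y in path:
            flow[(x, y)] += push
            flow[(y, x)] -= push

    reachable = bfs()  # source side of a minimum cut
    for v in area:
        block[v] = a if v in reachable else b
    return sum(flow[("s", y)] for y in nbrs["s"])
```

Because the current bipartition is itself one of the s-t cuts of this network, the minimum cut can never be larger than the current cut between the two blocks.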

Multi-try FM. The second method for improving a given partition is called multi-try FM. This local
improvement method moves nodes between blocks in order to decrease the cut. Previous k-way methods were
initialized with all boundary nodes, i.e., all boundary nodes were eligible for movement at the beginning.
Roughly speaking, the multi-try FM algorithm is a k-way improvement method that is initialized with a
single boundary node, thus achieving a more localized search. This is repeated for several rounds. For more
details about the multi-try FM algorithm we refer the reader to [25].

4 Experimental Evaluation 

Configurations of KaFFPa. The AMG coarsening was implemented separately, based on the coarsening for
linear ordering solvers from [2], and was integrated into KaFFPa [25]. The computational experiments have
been performed with six configurations of KaFFPa, which are presented in Table 1. All configurations use
the described FM algorithm and flows for the refinement. The strong configurations further employ flows
using larger areas, multi-try FM, and F-cycles. A detailed description of the refinement configurations can be
found in [ ]. Throughout this section, because of their respective similar running times, we concentrate on
two groups of comparison: fast versions (AMG-ECO, ECO, ECO-ALG) and strong versions (AMG,
STRONG, F-CYCLE). To be precise, the running time of F-CYCLE is usually larger than that of STRONG
and AMG. However, the running time gap between the fast and strong versions is even more significant on
average. Since the main goal of this paper is to introduce the AMG coarsening with different uncoarsening
configurations, most of the comparisons will be of type AMG vs. respective non-AMG ratios. A comprehensive
comparison of the F-CYCLE and the STRONG configuration can be found in [25].

All experiments are performed with fixed imbalance factor 3%. We also checked other small values, 
namely, 0%, 1%, and 5%; however, no significant difference in the comparison of the respective methods was 
observed. 
Configuration  Description
ECO            Represents the classical KaFFPaEco configuration, a good trade-off of partition quality and runtime.
ECO-ALG        Same refinement as in ECO; coarsening uses the GPA algorithm at each level and the edge rating
               function employs algebraic distances, i.e., it uses the rating function ex_alg(e) := expansion*2(e)/ρ_e.
F-CYCLE        Represents the classical KaFFPaStrong configuration using strong refinement schemes and the F-cycle
               scheme, with the purpose of achieving high partition quality; this configuration achieved the best known
               partitions for many instances from Benchmark I in 2010 [25].
STRONG         Uses the same refinement and matching schemes as in the F-CYCLE configuration; however, here only
               one single V-cycle is performed.
AMG-ECO        AMG coarsening based on algebraic distances with interpolation order at most 2, refinement as in ECO.
AMG            Same coarsening as in AMG-ECO, same refinement as in STRONG.

Table 1. Description of the six configurations used for the computational experiments.



Benchmark I: Walshaw's Partitioning Archive. Chris Walshaw's benchmark archive [2s] is a collection of
real-world instances for the partitioning problem. The rules used there imply that the running time is not
an issue, but one wants to achieve minimal cut values for $k \in \{2, 4, 8, 16, 32, 64\}$ and balance parameters
$\epsilon \in \{0, 0.01, 0.03, 0.05\}$. It is the most used graph partitioning benchmark in the literature. Most of the
graphs of the benchmark come from finite-element applications; however, there are also some graphs from
VLSI design and a road network. Over the years many different heuristics have been tested and adjusted on
this benchmark, so that many heuristics are able to obtain good results on these graphs.

In Figure 2 we present the results of the comparison of the algorithms on these graphs for different
numbers of blocks k. The horizontal axes represent ordered graphs from the benchmark (however, the ordering
itself will be different for each curve). The vertical axes are for ratios that represent the comparison of averages
of final results for a pair of methods. Each figure contains four curves. Each curve corresponds to a comparison
of the following pairs of methods: ECO vs. AMG-ECO, ECO-ALG vs. AMG-ECO, STRONG vs. AMG, and
F-CYCLE vs. AMG. Each point on the curves corresponds to the ratio between the average over 10 runs of
one method and the average over 10 runs of another method. Each run depends on different random seeds
and, thus, can produce different results. For example, the last point on the black solid curve in Figure 2a has
value 2.03, which means that

$\frac{\mathrm{average}(\text{ECO final cut given seed } s_1, \ldots, \text{ECO final cut given seed } s_{10})}{\mathrm{average}(\text{AMG-ECO final cut given seed } s_1, \ldots, \text{AMG-ECO final cut given seed } s_{10})} = 2.03$

in experimental series for k = 2. A comparison of the running time for uncoarsening phases is presented in
Figure 3. Each point on the curves in Figure 3 corresponds to a ratio of uncoarsening running times of two
methods. We observed that the uncoarsening performance of the fast versions (ECO, ECO-ALG, AMG-ECO) is
roughly similar. The uncoarsening of a STRONG V-cycle is somewhat slower than that of AMG
because of the density of coarse levels. The averages are summarized in Table 2. Full results are summarized
in [23].



k     ECO/ECO-ALG   ECO-ALG/AMG-ECO   STRONG/AMG   F-CYCLE/AMG
2     1.026         1.034             1.013        1.012
4     1.053         1.021             1.009        1.004
8     1.019         1.023             0.998        0.995
16    1.015         1.012             1.001        0.999
32    1.008         1.017             1.003        1.002
64    1.004         1.009             1.000        0.997

Table 2. Computational comparison for Benchmark I. Each number corresponds to the ratio of averages of final cuts
for the pair of methods in the column title and the k given in the row.

[Figure 2: six panels, (a) k = 2 through (f) k = 64.]
Fig. 2. Comparison of coarsening schemes on Walshaw's benchmark graphs. Figures (a)-(f) contain results of comparison
for k = 2, 4, 8, 16, 32, and 64, respectively. Each figure contains four curves that correspond to ECO/AMG-ECO,
ECO-ALG/AMG-ECO, STRONG/AMG, and F-CYCLE/AMG ratios, respectively. Each point on a curve corresponds
to the ratio related to one graph.

Fig. 3. Comparison of uncoarsening running time on Walshaw's benchmark graphs for k = 32. The figure contains
four curves that correspond to ECO/AMG-ECO, ECO-ALG/AMG-ECO, STRONG/AMG, and F-CYCLE/AMG
ratios, respectively. Each point on the curves corresponds to the ratio related to one graph.

Benchmark II: Scale-free networks. In scale-free networks the distribution of vertex degrees asymptotically
follows a power-law distribution. Examples of such networks include WWW links, social communities, and
biological networks. These types of networks often contain irregular parts and long-range links (in contrast to
Benchmark I) that can confuse both contraction and AMG coarsening schemes. Since Walshaw's benchmark
does not contain graphs derived from such networks, we evaluate our algorithms on 15 graphs collected from
[1,15]. Full information about these graphs, along with the computational results, is available at [23].

The results of the comparison on scale-free graphs are presented in Figure 4. Because of the large running 
time of the strong configurations on these graphs, we compare only the fast versions of AMG and matching-based
coarsening. Each figure corresponds to a different number of blocks k. The horizontal axes represent
graphs from the benchmark. The vertical axes are for ratios that represent comparison of averages of final 
results for a pair of methods. Each graph corresponds to one quadruple of bars. First, second, third and 
fourth bars represent averages of ratios ECO/AMG-ECO, ECO-ALG/AMG-ECO after finest refinement, 
ECO/AMG-ECO, ECO-ALG/AMG-ECO before finest refinement, respectively. As in the previous case the 
averages are calculated over 10 runs. 



k     ECO/ECO-ALG   ECO/ECO-ALG   ECO/ECO-ALG         ECO-ALG/AMG-ECO   ECO-ALG/AMG-ECO
      quality       full time     uncoarsening time   quality           uncoarsening time
2     1.38          0.77          1.62                1.16              3.62
4     1.24          1.32          1.85                1.11              2.14
8     1.15          1.29          1.45                1.07              1.94
16    1.09          1.27          1.33                1.06              1.69
32    1.06          1.18          1.23                1.00              1.60
64    1.06          1.13          1.13                1.01              2.99

Table 3. Computational comparison for scale-free graphs.



Benchmark III: Potentially Hard Graphs for Fast k -partitioning Algorithms. Today multilevel strategies 
represent one of the most effective and efficient generic frameworks for solving the graph partitioning problem 
on large-scale graphs. The reason is obvious: given a successful global optimization technique X for this 
problem, one can consider applying it locally by introducing a chain of subproblems along with fixed boundary 
conditions. Given this and if the coarsening preserves the structural properties of the graph well enough , the 
multilevel heuristic can behave better and work faster than a direct global application of the optimization 
technique X. Examples of such combinations include FM/KL, spectral and min-cut /max-flow techniques 
with multilevel frameworks. When can the multilevel framework produce low quality results? 

We present a simple strategy for checking the quality of multilevel schemes. To construct a potentially 
hard instance for gradual multilevel projections, we consider a mixture of graphs that are weakly connected 
with each other. These graphs have to possess different structural properties (such as finite-element faces, 
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1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(a) k = 2 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(b) k = 4 




2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(c) k = 8 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(d) k = 16 




1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(e) k = 32 



I 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(f) k = 64 



Fig. 4. Comparison of coarsening schemes on scale-free graphs. Figures (a)-(f) contain results of comparison for k = 2,
4, 8, 16, 32, and 64, respectively. Each quadruple of bars corresponds to one graph. The first, second, third, and fourth
bars represent averages of ratios ECO/AMG-ECO and ECO-ALG/AMG-ECO after refinement, and ECO/AMG-ECO and
ECO-ALG/AMG-ECO before refinement, respectively. Three exceptionally high ratios in these figures are between
2.1 and 3.
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ECO 


ECO 


ECO-ALG 


ECO-ALG 




STRONG 


F-C YCLE 




ECO-ALG 


ECO-ALG 


AMG-ECO 


AMG-ECO 


AMG 


AMG 


AMG 




quality 


full 


quality 


uncoarsening 


quality 


uncoarsening 


quality 


k 




time 




time 




time 




2 


1.42 


0.51 


1.18 


0.55 


1.15 


2.11 


1.11 


4 


1.15 


0.88 


1.23 


0.64 


1.13 


1.69 


1.12 


8 


1.12 


1.08 


1.08 


0.98 


1.05 


1.37 


1.04 



Table 4. Computational comparison for potentially hard graphs. 



power-law degree distribution, and density) to ensure nonuniform coarsening and mutual aggregation of
well-separated graph regions. Such mixtures of structures may have a twofold effect. First, they can force
the algorithm to contract incorrect edges; second, they can attract a "too strong" refinement to
a local optimum, which can contradict better local optima at finer levels. The latter situation has also been
observed, in different variations, in multilevel linear ordering algorithms [22]. In other words, uneven
aggregation with respect to the scales (not to be confused with uneven sizes of clusters) can lead refinement
algorithms into wrong local attraction basins. Examples of graphs that contain such mixtures of structures
include multi-mode networks [30] and logistics multi-stage system networks [29]. In general, such graphs can
be difficult not only for multilevel algorithms.

We created a benchmark (available at [ ]) with potentially hard mixtures. Each graph in this benchmark
represents a star-like structure of different graphs S_0, ..., S_t. Graphs S_1, ..., S_t are weakly connected to the
center S_0 by random edges. Since all the constituent graphs are sparse, a faster aggregation of them was
achieved by adding more than one random edge to each boundary node. The total number of edges between
each S_i and S_0 was less than 3% of the total number of edges in S_i. We considered mixtures of the
following structures: social networks, finite-element graphs, VLSI chips, peer-to-peer networks, and matrices
from optimization solvers.
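The construction above can be illustrated with a minimal sketch. It is not the generator used for the benchmark; the edge-list input format, the function name star_mixture, the default of two random edges per boundary node, and the uniform random choice of boundary nodes are all assumptions. It only enforces the two stated properties: every boundary node of S_i gets more than one random edge into S_0, and the number of connecting edges stays strictly below 3% of |E(S_i)|.

```python
import random


def star_mixture(center, satellites, pct=3, edges_per_node=2, seed=0):
    """Hypothetical sketch: combine S_0 (center) and S_1..S_t into one graph.

    Assumed input format: each graph is a list of (u, v) edges over nodes
    0..n-1. Nodes of S_i are shifted by an offset so that all ids are
    distinct; S_0 keeps ids 0..n_center-1. Returns (node count, edge list).
    """
    rng = random.Random(seed)
    n_center = 1 + max(max(e) for e in center)
    edges = list(center)
    offset = n_center
    for sat in satellites:
        n_sat = 1 + max(max(e) for e in sat)
        edges.extend((u + offset, v + offset) for u, v in sat)
        # Largest integer b with b < pct% of |E(S_i)| (strict "<", integer math).
        budget = max(0, (pct * len(sat) - 1) // 100)
        # Each (randomly chosen) boundary node gets several random edges to S_0.
        n_boundary = min(budget // edges_per_node, n_sat)
        for u in rng.sample(range(n_sat), n_boundary):
            for v in rng.sample(range(n_center), edges_per_node):
                edges.append((u + offset, v))
        offset += n_sat
    return offset, edges
```

For example, attaching two 300-edge paths to a 20-node cycle adds 8 connecting edges per path (fewer than 9, i.e. below 3% of 300), so the mixture has 622 nodes and 636 edges.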

The comparison on this benchmark is shown in Figure 5. Each graph corresponds to one quadruple
of bars. The first, second, third, and fourth bars represent averages over 10 ratios of ECO/AMG-ECO,
ECO-ALG/AMG-ECO, STRONG/AMG, and F-CYCLE/AMG, respectively. In almost all experiments we
observed that the introduction of algebraic distance as a measure of connectivity plays a crucial role in both fast
versions, AMG-ECO and ECO-ALG, since it helps to separate the subgraphs and postpone their aggregation
into one mixture. We also observe that both the fast and the slow AMG coarsenings almost always lead to better
results. Note that, in contrast to Benchmarks I and II, the uncoarsening of ECO-ALG is significantly faster
than that of AMG-ECO.
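The claim that algebraic distance postpones the aggregation of weakly connected subgraphs can be made concrete with a toy matching step. The sketch below is a plain greedy matching, not the matching algorithm or rating function used by ECO or ECO-ALG; the rating weight/ρ is a hypothetical simplification chosen only to show the effect of dividing by the algebraic distance ρ.

```python
def greedy_matching(edges, rho=None):
    """Greedy matching on a weighted graph (illustrative sketch).

    edges: dict {(u, v): weight}. rho: optional dict {(u, v): algebraic
    distance}. Hypothetical rating: weight / rho, so edges whose endpoints
    are far apart in the algebraic sense are considered last.
    """
    def rating(e):
        return edges[e] / rho[e] if rho is not None else edges[e]

    matched, matching = set(), []
    # Visit edges from best to worst rating; Python's sort is stable, so
    # ties keep the insertion order of the dict.
    for u, v in sorted(edges, key=rating, reverse=True):
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching
```

On two triangles joined by a heavy bridge edge (2, 3), rating by weight alone contracts the bridge first and merges the two subgraphs; if the bridge has a large algebraic distance relative to the triangle edges, the bridge is left unmatched and the triangles stay separate at the next level.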

Role of the algebraic distance. In this work the importance of the algebraic distance as a measure of connectivity
strength for graph partitioning algorithms has been demonstrated in almost all experimental settings.
In particular, the most significant gap was observed between the ECO and ECO-ALG versions (see all benchmarks),
which confirms preliminary experiments in [ ], where the algebraic distance was used at the
finest level only. The price for the improvement in quality is the additional running time for Jacobi over-relaxation,
which can be implemented with the most suitable (parallel) matrix-vector multiplication
method. However, for strong configurations and/or large irregular instances, the difference in
running time becomes less influential, because it is small compared with the amount of work in the refinement phase.
For example, for the largest graph in Benchmark I (auto, |V| = 448695, |E| = 3314611), ECO coarsening
is approximately 10 times faster than ECO-ALG coarsening; but for both configurations with k = 64, coarsening
takes less than 3% of the total time. Note that for the irregular instances from Benchmark II, already from
k = 4 on the total running time of ECO is larger than that of ECO-ALG (see Table 3). More examples of the
trade-off between changes in the objectives and those in the running times on Benchmark III are presented
in Figure 6.
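The Jacobi over-relaxation step mentioned above can be sketched as follows. This is a minimal dense-loop illustration, assuming the usual form of algebraic distance: R random test vectors are smoothed by x ← (1 − ω)x + ω D⁻¹Wx, and the distance of an edge combines the differences of its endpoint values over all vectors in the ℓ2 sense. The parameter values (R = 5 vectors, 20 iterations, ω = 0.5, initial values in [−0.5, 0.5]) are assumptions, not the settings of the experiments.

```python
import random


def algebraic_distances(n, edges, num_vectors=5, iterations=20, omega=0.5, seed=0):
    """Sketch of algebraic distance via Jacobi over-relaxation (JOR).

    edges: dict {(u, v): weight > 0} of an undirected graph on nodes 0..n-1.
    Returns dict {(u, v): rho_uv}; a small rho means strongly connected.
    """
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    vectors = []
    for _ in range(num_vectors):
        x = [rng.uniform(-0.5, 0.5) for _ in range(n)]
        for _ in range(iterations):
            new_x = []
            for i in range(n):
                wsum = sum(w for _, w in adj[i])
                if wsum == 0:  # isolated node: value stays unchanged
                    new_x.append(x[i])
                    continue
                avg = sum(w * x[j] for j, w in adj[i]) / wsum  # (D^-1 W x)_i
                new_x.append((1 - omega) * x[i] + omega * avg)
            x = new_x  # Jacobi: every update uses only the previous iterate
        vectors.append(x)
    # l2 combination of endpoint differences over all test vectors
    return {(u, v): sum((x[u] - x[v]) ** 2 for x in vectors) ** 0.5 for (u, v) in edges}
```

Each iteration is one weighted matrix-vector product, which is why the extra cost can be delegated to a (parallel) SpMV routine. On two triangles joined by a weak bridge, the smoothing quickly equalizes values inside each triangle, so the bridge ends up with a much larger ρ than any triangle edge.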

Does AMG coarsening help? The positive answer to this question is given mostly by Benchmarks II and III,
which contain relatively complex instances (Tables 3 and 4). On Benchmark III we have demonstrated that
the AMG configuration is superior to F-CYCLE, which runs significantly longer. This result is in contrast
to Benchmark I, in which we did not observe any particular class of graphs that showed a stable,
significant difference in favor of one of the methods in the pairs ECO-ALG vs AMG-ECO and STRONG vs
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1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(a) k = 2 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(b) k = 2, before last refinement 




1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(c) k = 4 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(d) k = 4, before last refinement 




1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(e) k = 8 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Graphs ordered by ratio ECO-ALG/AMG-ECO 

(f) k = 8, before last refinement 



Fig. 5. Comparison of coarsening schemes on hard examples. Figures (a,c,e) contain results of comparison before
applying finest-level refinement. Figures (b,d,f) contain the comparison of final results. Each quadruple of bars
corresponds to one graph. The first, second, third, and fourth bars represent averages of ratios ECO/AMG-ECO,
ECO-ALG/AMG-ECO, STRONG/AMG, and F-CYCLE/AMG, respectively. Four exceptionally high ratios in these figures
are between 3.5 and 5.7.
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Fig. 6. Benchmark III: Trade-off between changes in the objectives (horizontal axis) and those in the running times 
(vertical axis) on Benchmark III. Data points for k = 2, 4, and 8 are represented by circles, squares, and triangles, 
respectively. Average ratios are calculated each over 10 runs similarly to previous figures. The left and right figures 
describe the comparison for ECO vs. ECO-ALG and ECO-ALG vs. STRONG configurations, respectively. 

AMG. However, in both Benchmarks I and II we observed several graphs for which the AMG versions
were inferior to the respective matching-based versions for large k. The problem is eliminated when we stabilize ρ by using more
relaxations, according to Theorem 4.2 in [21]. We cannot present an exact comparison of coarsening
running times here because the underlying implementations are very different. Theoretically, however, if the
algebraic distance is used in both the matching and the AMG configurations, and the order of interpolation in
AMG is limited to 2 (usually it is 1, meaning that the coarse graphs do not become dense as in [7]), the
complexity of AMG coarsening should not exceed that of matching.
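Why a bounded interpolation order keeps AMG coarsening cheap can be seen from the Galerkin construction of the coarse graph, W_c = PᵀWP. The sketch below assumes that each fine node i has an interpolation row P_i with at most r nonzero entries (the interpolation order), obtained here by keeping the r largest weights and renormalizing; this truncation rule and the function names are assumptions for illustration. Each fine edge then contributes at most r² terms, so with r ≤ 2 the work stays within a small constant factor of |E|, and with r = 1 the construction reduces to ordinary contraction, as in matching.

```python
from collections import defaultdict


def truncate_interpolation(row, order):
    """Keep the `order` largest interpolation weights and renormalize to sum 1."""
    top = sorted(row.items(), key=lambda kv: kv[1], reverse=True)[:order]
    total = sum(p for _, p in top)
    return {coarse: p / total for coarse, p in top}


def galerkin_coarse_graph(edges, interp):
    """Coarse edge weights w_IJ = sum over fine edges (i, j) of P_iI * w_ij * P_jJ.

    edges: dict {(i, j): w_ij}, each undirected edge stored once.
    interp: dict fine node -> dict {coarse node I: P_iI}.
    Contributions with I == J (edges inside one aggregate) are dropped.
    """
    coarse = defaultdict(float)
    for (i, j), w in edges.items():
        for I, p_i in interp[i].items():      # at most r entries
            for J, p_j in interp[j].items():  # at most r entries
                if I != J:
                    coarse[(min(I, J), max(I, J))] += p_i * w * p_j
    return dict(coarse)
```

For instance, with order 1 the path 0-1-2-3 (weights 1, 2, 1) aggregated as {0, 1} and {2, 3} yields a single coarse edge of weight 2, exactly what contracting the two pairs would give.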



5 Conclusions 

We introduced a new coarsening scheme for multilevel graph partitioning based on AMG coarsening.
One of its most important components, namely, the algebraic distance connectivity measure, has been incorporated
into the matching coarsening schemes. Both coarsening schemes have been compared under fast
and strong refinement configurations. In addition to known benchmarks, we introduced new potentially
hard graphs for large-scale graph partitioning solvers (available at [ ]). As the main conclusion of this work,
we emphasize the success of the proposed AMG coarsening and of the algebraic distance connectivity measure
between nodes, demonstrated on highly irregular instances. One has to take into account the trade-off
between the increased running time when using algebraic distance and the improved quality of the partitions. As
graph size grows, this additional running time becomes less significant compared with the cost of
the refinement phase.

Many opportunities remain to improve the coarsening schemes for graph partitioning. We demonstrated
the crucial importance of connectivity strength metrics (especially for fast versions of the algorithm),
which raises the question of how these metrics can be used in the uncoarsening phase. Preliminary experiments
show that this has the potential to improve fast versions even more. Another issue that requires more insight
is the balancing of AMG aggregates: we observed a number of examples for which unbalanced
coarsening produces noticeably better results.